{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Build a Local RAG Application\n",
        "\n",
        "The popularity of projects like [PrivateGPT](https://github.com/imartinez/privateGPT), [llama.cpp](https://github.com/ggerganov/llama.cpp), [GPT4All](https://github.com/nomic-ai/gpt4all), and [llamafile](https://github.com/Mozilla-Ocho/llamafile) underscore the importance of running LLMs locally.\n",
        "\n",
        "LangChain has integrations with many open-source LLMs that can be run locally.\n",
        "\n",
        "For example, here we show how to run `OllamaEmbeddings` or `LLaMA2` locally (e.g., on your laptop) using local embeddings and a local LLM.\n",
        "\n",
        "## Document Loading \n",
        "\n",
        "First, install packages needed for local embeddings and vector storage."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n",
        "### Dependencies\n",
        "\n",
        "We’ll use the following packages:\n",
        "\n",
        "```bash\n",
        "npm install --save langchain @langchain/community cheerio\n",
        "```\n",
        "\n",
        "### LangSmith\n",
        "\n",
        "Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com/).\n",
        "\n",
        "Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:\n",
        "\n",
        "\n",
        "```bash\n",
        "export LANGCHAIN_TRACING_V2=true\n",
        "export LANGCHAIN_API_KEY=YOUR_KEY\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Initial setup\n",
        "\n",
        "Load and split an example document.\n",
        "\n",
        "We'll use a blog post on agents as an example."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import \"cheerio\";\n",
        "import { RecursiveCharacterTextSplitter } from \"langchain/text_splitter\";\n",
        "import { CheerioWebBaseLoader } from \"langchain/document_loaders/web/cheerio\";"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "146\n"
          ]
        }
      ],
      "source": [
        "const loader = new CheerioWebBaseLoader(\n",
        "  \"https://lilianweng.github.io/posts/2023-06-23-agent/\"\n",
        ");\n",
        "const docs = await loader.load();\n",
        "\n",
        "const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 500, chunkOverlap: 0 });\n",
        "const allSplits = await textSplitter.splitDocuments(docs);\n",
        "console.log(allSplits.length)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Next, we'll use `OllamaEmbeddings` for our local embeddings.\n",
        "Follow [these instructions](https://github.com/ollama/ollama) to set up and run a local Ollama instance."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [],
      "source": [
        "import { OllamaEmbeddings } from \"@langchain/community/embeddings/ollama\";\n",
        "import { MemoryVectorStore } from \"langchain/vectorstores/memory\";\n",
        "\n",
        "const embeddings = new OllamaEmbeddings();\n",
        "const vectorStore = await MemoryVectorStore.fromDocuments(allSplits, embeddings);"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Test similarity search is working with our local embeddings."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "4\n"
          ]
        }
      ],
      "source": [
        "const question = \"What are the approaches to Task Decomposition?\";\n",
        "const docs = await vectorStore.similaritySearch(question);\n",
        "console.log(docs.length)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Model \n",
        "\n",
        "### LLaMA2\n",
        "\n",
        "For local LLMs we'll use also use `ollama`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [],
      "source": [
        "import { ChatOllama } from \"@langchain/community/chat_models/ollama\";\n",
        "\n",
        "const ollamaLlm = new ChatOllama({\n",
        "  baseUrl: \"http://localhost:11434\", // Default value\n",
        "  model: \"llama2\", // Default value\n",
        "});\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "[The stage is set for a fierce rap battle between two of the funniest men on television. Stephen Colbert and John Oliver are standing face to face, each with their own microphone and confident smirk on their face.]\n",
            "\n",
            "Stephen Colbert:\n",
            "Yo, John Oliver, I heard you've been talking smack\n",
            "About my show and my satire, saying it's all fake\n",
            "But let me tell you something, brother, I'm the real deal\n",
            "I've been making fun of politicians for years, with no conceal\n",
            "\n",
            "John Oliver:\n",
            "Oh, Stephen, you think you're so clever and smart\n",
            "But your jokes are stale and your delivery's a work of art\n",
            "You're just a pale imitation of the real deal, Jon Stewart\n",
            "I'm the one who's really making waves, while you're just a little bird\n",
            "\n",
            "Stephen Colbert:\n",
            "Well, John, I may not be as loud as you, but I'm smarter\n",
            "My satire is more subtle, and it goes right over their heads\n",
            "I'm the one who's been exposing the truth for years\n",
            "While you're just a British interloper, trying to steal the cheers\n",
            "\n",
            "John Oliver:\n",
            "Oh, Stephen, you may have your fans, but I've got the brains\n",
            "My show is more than just slapstick and silly jokes, it's got depth and gains\n",
            "I'm the one who's really making a difference, while you're just a clown\n",
            "My satire is more than just a joke, it's a call to action, and I've got the crown\n",
            "\n",
            "[The crowd cheers and chants as the two comedians continue their rap battle.]\n",
            "\n",
            "Stephen Colbert:\n",
            "You may have your fans, John, but I'm the king of satire\n",
            "I've been making fun of politicians for years, and I'm still standing tall\n",
            "My jokes are clever and smart, while yours are just plain dumb\n",
            "I'm the one who's really in control, and you're just a pretender to the throne.\n",
            "\n",
            "John Oliver:\n",
            "Oh, Stephen, you may have your moment in the sun\n",
            "But I'm the one who's really shining bright, and my star is just beginning to rise\n",
            "My satire is more than just a joke, it's a call to action, and I've got the power\n",
            "I'm the one who's really making a difference, and you're just a fleeting flower.\n",
            "\n",
            "[The crowd continues to cheer and chant as the two comedians continue their rap battle.]\n"
          ]
        }
      ],
      "source": [
        "const response = await ollamaLlm.invoke(\"Simulate a rap battle between Stephen Colbert and John Oliver\");\n",
        "console.log(response.content);"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "See the LangSmith trace [here](https://smith.langchain.com/public/31c178b5-4bea-4105-88c3-7ec95325c817/r)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Using in a chain\n",
        "\n",
        "We can create a summarization chain with either model by passing in the retrieved docs and a simple prompt.\n",
        "\n",
        "It formats the prompt template using the input key values provided and passes the formatted string to `LLama-V2`, or another specified LLM."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [],
      "source": [
        "import { RunnableSequence } from \"@langchain/core/runnables\";\n",
        "import { StringOutputParser } from \"@langchain/core/output_parsers\";\n",
        "import { PromptTemplate } from \"@langchain/core/prompts\";\n",
        "import { createStuffDocumentsChain } from \"langchain/chains/combine_documents\";\n",
        "\n",
        "const prompt = PromptTemplate.fromTemplate(\"Summarize the main themes in these retrieved docs: {context}\");\n",
        "\n",
        "const chain = await createStuffDocumentsChain({\n",
        "  llm: ollamaLlm,\n",
        "  outputParser: new StringOutputParser(),\n",
        "  prompt,\n",
        "})"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "\u001b[32m\"The main themes retrieved from the provided documents are:\\n\"\u001b[39m +\n",
              "  \u001b[32m\"\\n\"\u001b[39m +\n",
              "  \u001b[32m\"1. Sensory Memory: The ability to retain\"\u001b[39m... 1117 more characters"
            ]
          },
          "execution_count": 6,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "const question = \"What are the approaches to Task Decomposition?\";\n",
        "const docs = await vectorStore.similaritySearch(question);\n",
        "await chain.invoke({\n",
        "  context: docs,\n",
        "});"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "See the LangSmith trace [here](https://smith.langchain.com/public/47cf6c2a-3d86-4f2b-9a51-ee4663b19152/r)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Q&A \n",
        "\n",
        "We can also use the LangChain Prompt Hub to store and fetch prompts that are model-specific.\n",
        "\n",
        "Let's try with a default RAG prompt, [here](https://smith.langchain.com/hub/rlm/rag-prompt)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {},
      "outputs": [],
      "source": [
        "import { pull } from \"langchain/hub\";\n",
        "import { ChatPromptTemplate } from \"@langchain/core/prompts\";\n",
        "\n",
        "const ragPrompt = await pull<ChatPromptTemplate>(\"rlm/rag-prompt\");\n",
        "\n",
        "const chain = await createStuffDocumentsChain({\n",
        "  llm: ollamaLlm,\n",
        "  outputParser: new StringOutputParser(),\n",
        "  prompt: ragPrompt,\n",
        "});"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's see what this prompt actually looks like:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\n",
            "Question: {question} \n",
            "Context: {context} \n",
            "Answer:\n"
          ]
        }
      ],
      "source": [
        "console.log(ragPrompt.promptMessages.map((msg) => msg.prompt.template).join(\"\\n\"));"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "\u001b[32m\"Task decomposition is a crucial step in breaking down complex problems into manageable parts for eff\"\u001b[39m... 1095 more characters"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "await chain.invoke({ context: docs, question });"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "See the LangSmith trace [here](https://smith.langchain.com/public/dd3a189b-53a1-4f31-9766-244cd04ad1f7/r)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Q&A with retrieval\n",
        "\n",
        "Instead of manually passing in docs, we can automatically retrieve them from our vector store based on the user question.\n",
        "\n",
        "This will use a QA default prompt and will retrieve from the vectorDB."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "\u001b[32m\"Based on the context provided, I understand that you are asking me to answer a question related to m\"\u001b[39m... 948 more characters"
            ]
          },
          "execution_count": 9,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import { RunnablePassthrough, RunnableSequence } from \"@langchain/core/runnables\";\n",
        "import { formatDocumentsAsString } from \"langchain/util/document\";\n",
        "\n",
        "const retriever = vectorStore.asRetriever();\n",
        "\n",
        "const qaChain = RunnableSequence.from([\n",
        "  {\n",
        "    context: (input: { question: string }, callbacks) => {\n",
        "      const retrieverAndFormatter = retriever.pipe(formatDocumentsAsString);\n",
        "      return retrieverAndFormatter.invoke(input.question, callbacks);\n",
        "    },\n",
        "    question: new RunnablePassthrough(),\n",
        "  },\n",
        "  ragPrompt,\n",
        "  ollamaLlm,\n",
        "  new StringOutputParser(),\n",
        "]);\n",
        "\n",
        "await qaChain.invoke({ question });"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "See the LangSmith trace [here](https://smith.langchain.com/public/440e65ee-0301-42cf-afc9-f09cfb52cf64/r)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Deno",
      "language": "typescript",
      "name": "deno"
    },
    "language_info": {
      "file_extension": ".ts",
      "mimetype": "text/x.typescript",
      "name": "typescript",
      "nb_converter": "script",
      "pygments_lexer": "typescript",
      "version": "5.3.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
